Overview

Dataset statistics

Number of variables13
Number of observations2891
Missing cells48
Missing cells (%)0.1%
Duplicate rows0
Duplicate rows (%)0.0%
Total size in memory293.7 KiB
Average record size in memory104.0 B
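The missing-cell percentage is taken over all cells: 48 / (13 × 2891) = 48 / 37583 ≈ 0.13%, which the report rounds to 0.1%. The sketch below shows how such overview figures can be derived. It is a stdlib-only illustration, not pandas-profiling's implementation, and it assumes that a missing cell is represented by `None` and that each row is a tuple of hashable values. The function name `dataset_overview` is hypothetical.

```python
def dataset_overview(rows):
    """Count missing cells and duplicate rows in a table given as a list of rows.

    Assumptions for this sketch: a missing cell is None and every row is a
    tuple of hashable values.
    """
    n_rows = len(rows)
    n_cells = n_rows * (len(rows[0]) if rows else 0)
    missing = sum(1 for row in rows for cell in row if cell is None)

    # A row is a duplicate if an identical row has already been seen.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }


# Toy rows for illustration only (not taken from the dataset).
toy = [("a", 1), ("a", 1), ("b", None)]
print(dataset_overview(toy))
```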

Variable types

Categorical4
Numeric8
DateTime1

Alerts

updated_at has constant value "2022-12-10 21:24:34.727669" [Constant]
city has high cardinality: 567 distinct values [High cardinality]
male_population has a high overall correlation with female_population and 4 other fields [High correlation]
female_population has a high overall correlation with male_population and 4 other fields [High correlation]
total_population has a high overall correlation with male_population and 4 other fields [High correlation]
number_of_veterans has a high overall correlation with state and 6 other fields [High correlation]
foreign_born has a high overall correlation with male_population and 4 other fields [High correlation]
state has a high overall correlation with median_age and 3 other fields [High correlation]
state_code has a high overall correlation with state and 3 other fields [High correlation]
count has a high overall correlation with male_population and 4 other fields [High correlation]
median_age has a high overall correlation with state and 1 other field [High correlation]
average_household_size has a high overall correlation with state and 1 other field [High correlation]
city is uniformly distributed [Uniform]
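The alerts above follow simple rules: a column with a single distinct value is flagged as constant, a categorical column with many distinct values as high-cardinality, and each column whose correlation with other columns exceeds some threshold gets a message naming one partner plus a count of the rest. The sketch below illustrates that logic. Both thresholds (`CARDINALITY_THRESHOLD`, `CORRELATION_THRESHOLD`) and both function names are hypothetical values chosen for illustration; they are not read from the report's configuration.

```python
# Hypothetical thresholds for illustration only; not taken from the report's config.
CARDINALITY_THRESHOLD = 50
CORRELATION_THRESHOLD = 0.9


def column_alerts(values, categorical):
    """Return 'Constant' / 'High cardinality' labels for one column (None = missing)."""
    distinct = len({v for v in values if v is not None})
    alerts = []
    if distinct == 1:
        alerts.append("Constant")
    if categorical and distinct > CARDINALITY_THRESHOLD:
        alerts.append("High cardinality")
    return alerts


def correlation_alerts(corr, threshold=CORRELATION_THRESHOLD):
    """Build 'X has a high overall correlation with Y and N other fields' messages.

    `corr` maps each column name to {other column name: coefficient}.
    Partners are listed in the dict's insertion order.
    """
    messages = []
    for col, row in corr.items():
        partners = [o for o, r in row.items() if o != col and abs(r) >= threshold]
        if not partners:
            continue
        msg = f"{col} has a high overall correlation with {partners[0]}"
        rest = len(partners) - 1
        if rest:
            msg += f" and {rest} other field{'s' if rest > 1 else ''}"
        messages.append(msg)
    return messages
```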

Reproduction

Analysis started2022-12-10 13:30:55.321834
Analysis finished2022-12-10 13:31:14.967163
Duration19.65 seconds
Software versionpandas-profiling v3.5.0

Variables

city
Categorical

HIGH CARDINALITY
UNIFORM

Distinct567
Distinct (%)19.6%
Missing0
Missing (%)0.0%
Memory size22.7 KiB
Bloomington
 
15
Columbia
 
15
Springfield
 
15
Jackson
 
10
Norwalk
 
10
Other values (562)
2826 

Length

Max length47
Median length25
Mean length9.1030785
Min length2

Characters and Unicode

Total characters26317
Distinct characters56
Distinct categories5
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
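Python's standard `unicodedata` module exposes one of these properties, the general category, so the "Most occurring categories" tables can be approximated as sketched below. The mapping from two-letter codes to long names covers only the categories that appear in this report; the function name `category_counts` is hypothetical, and scripts and blocks are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["O'Fallon", "New York"]))
```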

Unique

Unique2
Unique (%)0.1%
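"Distinct" counts different values, while "Unique" counts values that occur exactly once. Both percentages appear to be taken over non-missing values: female_population shows 594 / 2888 ≈ 20.6%, whereas total_population (no missing values) shows 594 / 2891 ≈ 20.5%. A minimal sketch under that assumption, with `None` standing for a missing value and a hypothetical function name:

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct and unique counts, with percentages over non-missing values."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n if n else 0.0,
        "unique": unique,
        "unique_pct": 100 * unique / n if n else 0.0,
    }
```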

Sample

1st rowSilver Spring
2nd rowQuincy
3rd rowHoover
4th rowRancho Cucamonga
5th rowNewark

Common Values

ValueCountFrequency (%)
Bloomington 15
 
0.5%
Columbia 15
 
0.5%
Springfield 15
 
0.5%
Jackson 10
 
0.3%
Norwalk 10
 
0.3%
Lakewood 10
 
0.3%
Arlington 10
 
0.3%
Fayetteville 10
 
0.3%
Rochester 10
 
0.3%
Albany 10
 
0.3%
Other values (557) 2776
96.0%
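Each "Common Values" table lists the top values and folds the remainder into an "Other values (k)" row; for city, 2776 / 2891 ≈ 96.0%. The sketch below builds such a table with `collections.Counter`. The function name is hypothetical, and the tie-breaking order (first occurrence wins, as `Counter.most_common` does) may differ from the report's.

```python
from collections import Counter


def common_values(values, top=10):
    """Return (value, count, percent) rows plus an 'Other values (k)' row."""
    n = len(values)
    counts = Counter(values)
    ranked = counts.most_common()  # ties keep first-seen order
    rows = [(v, c, 100 * c / n) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100 * other / n))
    return rows
```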

Length

Histogram of lengths of the category

Most occurring words
ValueCountFrequency (%)
city 98
 
2.5%
san 72
 
1.9%
beach 51
 
1.3%
valley 40
 
1.0%
saint 40
 
1.0%
santa 40
 
1.0%
new 34
 
0.9%
fort 29
 
0.8%
park 25
 
0.7%
hills 24
 
0.6%
Other values (598) 3392
88.2%
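The word table above contains only lowercase tokens, and the race table later keeps "african-american" as a single word, which suggests that values are lowercased and split on whitespace. The sketch below assumes exactly that; the report's actual tokenizer may also strip punctuation or apply other rules, and `word_counts` is a hypothetical name.

```python
from collections import Counter


def word_counts(values):
    """Lowercase each value, split on whitespace, and count the tokens."""
    counts = Counter()
    for text in values:
        counts.update(text.lower().split())
    return counts


print(word_counts(["Hispanic or Latino", "Black or African-American"]).most_common(3))
```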

Most occurring characters

ValueCountFrequency (%)
a 2515
 
9.6%
e 2366
 
9.0%
n 2076
 
7.9%
o 2007
 
7.6%
r 1595
 
6.1%
i 1579
 
6.0%
l 1569
 
6.0%
t 1297
 
4.9%
s 1045
 
4.0%
[space] 959
 
3.6%
Other values (46) 9309
35.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 21452
81.5%
Uppercase Letter 3853
 
14.6%
Space Separator 959
 
3.6%
Dash Punctuation 28
 
0.1%
Other Punctuation 25
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 2515
11.7%
e 2366
11.0%
n 2076
9.7%
o 2007
9.4%
r 1595
 
7.4%
i 1579
 
7.4%
l 1569
 
7.3%
t 1297
 
6.0%
s 1045
 
4.9%
d 697
 
3.2%
Other values (18) 4706
21.9%
Uppercase Letter
ValueCountFrequency (%)
C 475
12.3%
S 405
 
10.5%
B 293
 
7.6%
P 248
 
6.4%
L 243
 
6.3%
M 237
 
6.2%
R 229
 
5.9%
A 227
 
5.9%
F 170
 
4.4%
W 157
 
4.1%
Other values (14) 1169
30.3%
Other Punctuation
ValueCountFrequency (%)
' 20
80.0%
/ 5
 
20.0%
Space Separator
ValueCountFrequency (%)
[space] 959
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 28
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 25305
96.2%
Common 1012
 
3.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 2515
 
9.9%
e 2366
 
9.3%
n 2076
 
8.2%
o 2007
 
7.9%
r 1595
 
6.3%
i 1579
 
6.2%
l 1569
 
6.2%
t 1297
 
5.1%
s 1045
 
4.1%
d 697
 
2.8%
Other values (42) 8559
33.8%
Common
ValueCountFrequency (%)
[space] 959
94.8%
- 28
 
2.8%
' 20
 
2.0%
/ 5
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 26314
> 99.9%
None 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 2515
 
9.6%
e 2366
 
9.0%
n 2076
 
7.9%
o 2007
 
7.6%
r 1595
 
6.1%
i 1579
 
6.0%
l 1569
 
6.0%
t 1297
 
4.9%
s 1045
 
4.0%
[space] 959
 
3.6%
Other values (44) 9306
35.4%
None
ValueCountFrequency (%)
ü 2
66.7%
ó 1
33.3%

state
Categorical

Distinct49
Distinct (%)1.7%
Missing0
Missing (%)0.0%
Memory size22.7 KiB
California
676 
Texas
273 
Florida
222 
Illinois
 
91
Washington
 
85
Other values (44)
1544 

Length

Max length20
Median length13
Mean length8.4268419
Min length4

Characters and Unicode

Total characters24362
Distinct characters46
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowMaryland
2nd rowMassachusetts
3rd rowAlabama
4th rowCalifornia
5th rowNew Jersey

Common Values

ValueCountFrequency (%)
California 676
23.4%
Texas 273
 
9.4%
Florida 222
 
7.7%
Illinois 91
 
3.1%
Washington 85
 
2.9%
Arizona 80
 
2.8%
Colorado 80
 
2.8%
Michigan 79
 
2.7%
North Carolina 70
 
2.4%
Virginia 70
 
2.4%
Other values (39) 1165
40.3%

Length

Histogram of lengths of the category

Most occurring words
ValueCountFrequency (%)
california 676
21.2%
texas 273
 
8.6%
florida 222
 
7.0%
new 141
 
4.4%
carolina 94
 
2.9%
illinois 91
 
2.9%
washington 85
 
2.7%
arizona 80
 
2.5%
colorado 80
 
2.5%
north 80
 
2.5%
Other values (44) 1366
42.8%

Most occurring characters

ValueCountFrequency (%)
a 3650
15.0%
i 3033
12.4%
o 2207
 
9.1%
n 2073
 
8.5%
r 1675
 
6.9%
l 1435
 
5.9%
s 1390
 
5.7%
e 1136
 
4.7%
C 894
 
3.7%
f 681
 
2.8%
Other values (36) 6188
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 20882
85.7%
Uppercase Letter 3183
 
13.1%
Space Separator 297
 
1.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 3650
17.5%
i 3033
14.5%
o 2207
10.6%
n 2073
9.9%
r 1675
8.0%
l 1435
 
6.9%
s 1390
 
6.7%
e 1136
 
5.4%
f 681
 
3.3%
t 580
 
2.8%
Other values (14) 3022
14.5%
Uppercase Letter
ValueCountFrequency (%)
C 894
28.1%
M 341
 
10.7%
T 317
 
10.0%
N 276
 
8.7%
F 222
 
7.0%
I 210
 
6.6%
A 148
 
4.6%
W 130
 
4.1%
O 119
 
3.7%
V 70
 
2.2%
Other values (11) 456
14.3%
Space Separator
ValueCountFrequency (%)
[space] 297
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 24065
98.8%
Common 297
 
1.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 3650
15.2%
i 3033
12.6%
o 2207
 
9.2%
n 2073
 
8.6%
r 1675
 
7.0%
l 1435
 
6.0%
s 1390
 
5.8%
e 1136
 
4.7%
C 894
 
3.7%
f 681
 
2.8%
Other values (35) 5891
24.5%
Common
ValueCountFrequency (%)
[space] 297
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24362
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 3650
15.0%
i 3033
12.4%
o 2207
 
9.1%
n 2073
 
8.5%
r 1675
 
6.9%
l 1435
 
5.9%
s 1390
 
5.7%
e 1136
 
4.7%
C 894
 
3.7%
f 681
 
2.8%
Other values (36) 6188
25.4%

median_age
Real number (ℝ)

Distinct180
Distinct (%)6.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean35.494881
Minimum22.9
Maximum70.5
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size22.7 KiB

Quantile statistics

Minimum22.9
5-th percentile28.8
Q132.8
median35.3
Q338
95-th percentile42.6
Maximum70.5
Range47.6
Interquartile range (IQR)5.2
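Range and IQR follow directly from the quantiles: 70.5 - 22.9 = 47.6 and 38 - 32.8 = 5.2. The fractional quartiles elsewhere in the report (for example Q3 = 86641.75 for male_population) point to interpolated quantiles. The sketch below assumes the linear interpolation that is the default in NumPy and pandas; the function names are hypothetical.

```python
import math


def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def quantile_statistics(values):
    """Quantile block as laid out in the report; None (missing) is ignored."""
    xs = sorted(v for v in values if v is not None)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "minimum": xs[0],
        "5-th percentile": quantile(xs, 0.05),
        "Q1": q1,
        "median": quantile(xs, 0.5),
        "Q3": q3,
        "95-th percentile": quantile(xs, 0.95),
        "maximum": xs[-1],
        "range": xs[-1] - xs[0],
        "IQR": q3 - q1,
    }
```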

Descriptive statistics

Standard deviation4.4016167
Coefficient of variation (CV)0.12400709
Kurtosis4.164544
Mean35.494881
Median Absolute Deviation (MAD)2.6
Skewness0.64661844
Sum102615.7
Variance19.37423
MonotonicityNot monotonic
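The descriptive statistics are related: CV = standard deviation / mean, so 4.4016167 / 35.494881 ≈ 0.124, and the variance is the squared standard deviation. The sketch below reproduces the block under stated assumptions: sample (n - 1) variance, MAD as the unscaled median of absolute deviations from the median, and the bias-corrected sample skewness and excess kurtosis used by pandas' `Series.skew` and `Series.kurt`. The function name is hypothetical and it needs at least four values.

```python
import math
import statistics


def descriptive_statistics(values):
    """Descriptive block for a numeric column; None (missing) is ignored."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    mean = sum(xs) / n
    var = statistics.variance(xs)  # sample variance (n - 1 denominator)
    std = math.sqrt(var)
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])  # median absolute deviation

    # Central moments with an n denominator.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n

    # Adjusted Fisher-Pearson skewness G1.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * (m3 / m2 ** 1.5)
    # Bias-corrected excess kurtosis G2.
    g2 = m4 / m2 ** 2 - 3
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

    return {
        "std": std,
        "cv": std / mean,
        "kurtosis": kurt,
        "mean": mean,
        "mad": mad,
        "skewness": skew,
        "sum": sum(xs),
        "variance": var,
    }
```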
Histogram with fixed size bins (bins=50)

Common Values
ValueCountFrequency (%)
35.7 50
 
1.7%
33.4 48
 
1.7%
33.1 45
 
1.6%
36.8 45
 
1.6%
34.1 45
 
1.6%
34.5 44
 
1.5%
38.1 42
 
1.5%
34.6 40
 
1.4%
35.3 40
 
1.4%
36 40
 
1.4%
Other values (170) 2452
84.8%

Minimum 10 values
ValueCountFrequency (%)
22.9 5
0.2%
23 4
 
0.1%
23.5 5
0.2%
23.6 5
0.2%
23.9 5
0.2%
24.2 5
0.2%
25.5 5
0.2%
26 5
0.2%
26.1 5
0.2%
26.2 10
0.3%

Maximum 10 values
ValueCountFrequency (%)
70.5 3
0.1%
48.8 5
0.2%
47.9 4
0.1%
47.6 4
0.1%
47.4 5
0.2%
47.3 5
0.2%
47 5
0.2%
46.9 5
0.2%
46.8 4
0.1%
45.9 4
0.1%

male_population
Real number (ℝ)

Distinct593
Distinct (%)20.5%
Missing3
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean97328.426
Minimum29281
Maximum4081698
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size22.7 KiB

Quantile statistics

Minimum29281
5-th percentile32290
Q139289
median52341
Q386641.75
95-th percentile296902.6
Maximum4081698
Range4052417
Interquartile range (IQR)47352.75

Descriptive statistics

Standard deviation216299.94
Coefficient of variation (CV)2.2223717
Kurtosis209.81379
Mean97328.426
Median Absolute Deviation (MAD)15991
Skewness12.735597
Sum2.810845 × 108
Variance4.6785663 × 1010
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)

Common Values
ValueCountFrequency (%)
40601 10
 
0.3%
33993 10
 
0.3%
135455 5
 
0.2%
518317 5
 
0.2%
149547 5
 
0.2%
60989 5
 
0.2%
60704 5
 
0.2%
83640 5
 
0.2%
55550 5
 
0.2%
42100 5
 
0.2%
Other values (583) 2828
97.8%

Minimum 10 values
ValueCountFrequency (%)
29281 5
0.2%
29995 5
0.2%
30007 5
0.2%
30193 4
0.1%
30758 5
0.2%
30799 2
 
0.1%
30844 5
0.2%
30890 5
0.2%
31019 5
0.2%
31205 5
0.2%

Maximum 10 values
ValueCountFrequency (%)
4081698 5
0.2%
1958998 5
0.2%
1320015 5
0.2%
1149686 5
0.2%
786833 5
0.2%
741270 5
0.2%
721405 5
0.2%
693826 5
0.2%
639019 5
0.2%
518317 5
0.2%

female_population
Real number (ℝ)

Distinct594
Distinct (%)20.6%
Missing3
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean101769.63
Minimum27348
Maximum4468707
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size22.7 KiB

Quantile statistics

Minimum27348
5-th percentile34163
Q141227
median53809
Q389604
95-th percentile315853.35
Maximum4468707
Range4441359
Interquartile range (IQR)48377

Descriptive statistics

Standard deviation231564.57
Coefficient of variation (CV)2.2753799
Kurtosis227.63305
Mean101769.63
Median Absolute Deviation (MAD)15771
Skewness13.320445
Sum2.9391069 × 108
Variance5.3622151 × 1010
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)

Common Values
ValueCountFrequency (%)
35801 10
 
0.3%
41862 5
 
0.2%
57422 5
 
0.2%
508602 5
 
0.2%
151293 5
 
0.2%
61247 5
 
0.2%
65417 5
 
0.2%
92957 5
 
0.2%
144323 5
 
0.2%
43511 5
 
0.2%
Other values (584) 2833
98.0%

Minimum 10 values
ValueCountFrequency (%)
27348 5
0.2%
31238 4
0.1%
31456 4
0.1%
32173 3
0.1%
32397 5
0.2%
32745 5
0.2%
32763 4
0.1%
32799 5
0.2%
32807 5
0.2%
32901 5
0.2%

Maximum 10 values
ValueCountFrequency (%)
4468707 5
0.2%
2012898 5
0.2%
1400541 5
0.2%
1148942 5
0.2%
826172 5
0.2%
776168 5
0.2%
748419 5
0.2%
701081 5
0.2%
661063 5
0.2%
508602 5
0.2%

total_population
Real number (ℝ)

Distinct594
Distinct (%)20.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean198966.78
Minimum63215
Maximum8550405
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size22.7 KiB

Quantile statistics

Minimum63215
5-th percentile67271
Q180429
median106782
Q3175232
95-th percentile618619
Maximum8550405
Range8487190
Interquartile range (IQR)94803

Descriptive statistics

Standard deviation447555.93
Coefficient of variation (CV)2.2494003
Kurtosis219.21588
Mean198966.78
Median Absolute Deviation (MAD)32640
Skewness13.044623
Sum5.7521296 × 108
Variance2.0030631 × 1011
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)

Common Values
ValueCountFrequency (%)
68097 10
 
0.3%
71024 10
 
0.3%
82463 5
 
0.2%
104808 5
 
0.2%
87873 5
 
0.2%
73432 5
 
0.2%
248956 5
 
0.2%
66872 5
 
0.2%
107899 5
 
0.2%
451949 5
 
0.2%
Other values (584) 2831
97.9%

Minimum 10 values
ValueCountFrequency (%)
63215 5
0.2%
63651 5
0.2%
63792 5
0.2%
64609 5
0.2%
64819 4
0.1%
64837 5
0.2%
64962 5
0.2%
65052 4
0.1%
65058 5
0.2%
65065 5
0.2%

Maximum 10 values
ValueCountFrequency (%)
8550405 5
0.2%
3971896 5
0.2%
2720556 5
0.2%
2298628 5
0.2%
1567442 5
0.2%
1563001 5
0.2%
1469824 5
0.2%
1394907 5
0.2%
1300082 5
0.2%
1026919 5
0.2%

number_of_veterans
Real number (ℝ)

Distinct577
Distinct (%)20.0%
Missing13
Missing (%)0.4%
Infinite0
Infinite (%)0.0%
Mean9367.8325
Minimum416
Maximum156961
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size22.7 KiB

Quantile statistics

Minimum416
5-th percentile1990
Q13739
median5397
Q39368
95-th percentile29511
Maximum156961
Range156545
Interquartile range (IQR)5629

Descriptive statistics

Standard deviation13211.22
Coefficient of variation (CV)1.410275
Kurtosis39.853818
Mean9367.8325
Median Absolute Deviation (MAD)2281
Skewness5.2959233
Sum26960622
Variance1.7453633 × 108
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)

Common Values
ValueCountFrequency (%)
4211 10
 
0.3%
5714 10
 
0.3%
5204 10
 
0.3%
3397 10
 
0.3%
3116 10
 
0.3%
3647 10
 
0.3%
3063 10
 
0.3%
5532 10
 
0.3%
3404 9
 
0.3%
3027 9
 
0.3%
Other values (567) 2780
96.2%
(Missing) 13
 
0.4%

Minimum 10 values
ValueCountFrequency (%)
416 3
0.1%
629 5
0.2%
693 4
0.1%
705 5
0.2%
724 5
0.2%
776 4
0.1%
780 5
0.2%
897 5
0.2%
1066 4
0.1%
1101 5
0.2%

Maximum 10 values
ValueCountFrequency (%)
156961 5
0.2%
109089 5
0.2%
92489 5
0.2%
85417 5
0.2%
75432 5
0.2%
72388 5
0.2%
72042 5
0.2%
71898 5
0.2%
61995 5
0.2%
54995 5
0.2%

foreign_born
Real number (ℝ)

Distinct587
Distinct (%)20.4%
Missing13
Missing (%)0.4%
Infinite0
Infinite (%)0.0%
Mean40653.599
Minimum861
Maximum3212500
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size22.7 KiB

Quantile statistics

Minimum861
5-th percentile3215
Q19224
median18822
Q333971.75
95-th percentile109222.15
Maximum3212500
Range3211639
Interquartile range (IQR)24747.75

Descriptive statistics

Standard deviation155749.1
Coefficient of variation (CV)3.8311271
Kurtosis310.38784
Mean40653.599
Median Absolute Deviation (MAD)11101
Skewness16.355795
Sum1.1700106 × 108
Variance2.4257783 × 1010
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)

Common Values
ValueCountFrequency (%)
5757 10
 
0.3%
13409 10
 
0.3%
30908 5
 
0.2%
56514 5
 
0.2%
21959 5
 
0.2%
8948 5
 
0.2%
10599 5
 
0.2%
24503 5
 
0.2%
9257 5
 
0.2%
6630 5
 
0.2%
Other values (577) 2818
97.5%
(Missing) 13
 
0.4%

Minimum 10 values
ValueCountFrequency (%)
861 5
0.2%
1058 5
0.2%
1062 4
0.1%
1224 5
0.2%
1531 5
0.2%
1699 5
0.2%
1789 5
0.2%
1815 5
0.2%
1884 5
0.2%
2064 5
0.2%

Maximum 10 values
ValueCountFrequency (%)
3212500 5
0.2%
1485425 5
0.2%
696210 5
0.2%
573463 5
0.2%
401493 5
0.2%
373842 5
0.2%
326825 5
0.2%
300702 5
0.2%
297199 5
0.2%
260789 5
0.2%

average_household_size
Real number (ℝ)

Distinct161
Distinct (%)5.6%
Missing16
Missing (%)0.6%
Infinite0
Infinite (%)0.0%
Mean2.7425426
Minimum2
Maximum4.98
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size22.7 KiB

Quantile statistics

Minimum2
5-th percentile2.22
Q12.43
median2.65
Q32.95
95-th percentile3.58
Maximum4.98
Range2.98
Interquartile range (IQR)0.52

Descriptive statistics

Standard deviation0.43329109
Coefficient of variation (CV)0.15798883
Kurtosis2.8619082
Mean2.7425426
Median Absolute Deviation (MAD)0.24
Skewness1.4095639
Sum7884.81
Variance0.18774117
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)

Common Values
ValueCountFrequency (%)
2.4 78
 
2.7%
2.72 68
 
2.4%
2.97 55
 
1.9%
2.39 54
 
1.9%
2.64 54
 
1.9%
2.41 50
 
1.7%
2.68 50
 
1.7%
2.52 49
 
1.7%
2.73 48
 
1.7%
2.55 45
 
1.6%
Other values (151) 2324
80.4%

Minimum 10 values
ValueCountFrequency (%)
2 5
 
0.2%
2.06 5
 
0.2%
2.08 10
0.3%
2.1 4
 
0.1%
2.11 5
 
0.2%
2.12 5
 
0.2%
2.13 15
0.5%
2.15 10
0.3%
2.16 9
0.3%
2.17 10
0.3%

Maximum 10 values
ValueCountFrequency (%)
4.98 5
0.2%
4.78 4
 
0.1%
4.58 5
0.2%
4.57 3
 
0.1%
4.43 4
 
0.1%
4.15 5
0.2%
4.13 5
0.2%
4.08 10
0.3%
3.97 5
0.2%
3.93 5
0.2%

state_code
Categorical

Distinct49
Distinct (%)1.7%
Missing0
Missing (%)0.0%
Memory size22.7 KiB
CA
676 
TX
273 
FL
222 
IL
 
91
WA
 
85
Other values (44)
1544 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters5782
Distinct characters24
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowMD
2nd rowMA
3rd rowAL
4th rowCA
5th rowNJ

Common Values

ValueCountFrequency (%)
CA 676
23.4%
TX 273
 
9.4%
FL 222
 
7.7%
IL 91
 
3.1%
WA 85
 
2.9%
AZ 80
 
2.8%
CO 80
 
2.8%
MI 79
 
2.7%
NC 70
 
2.4%
VA 70
 
2.4%
Other values (39) 1165
40.3%

Length

Histogram of lengths of the category

Most occurring words
ValueCountFrequency (%)
ca 676
23.4%
tx 273
 
9.4%
fl 222
 
7.7%
il 91
 
3.1%
wa 85
 
2.9%
az 80
 
2.8%
co 80
 
2.8%
mi 79
 
2.7%
nc 70
 
2.4%
va 70
 
2.4%
Other values (39) 1165
40.3%

Most occurring characters

ValueCountFrequency (%)
A 1210
20.9%
C 894
15.5%
N 425
 
7.4%
T 414
 
7.2%
L 387
 
6.7%
M 341
 
5.9%
I 339
 
5.9%
X 273
 
4.7%
O 244
 
4.2%
F 222
 
3.8%
Other values (14) 1033
17.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5782
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 1210
20.9%
C 894
15.5%
N 425
 
7.4%
T 414
 
7.2%
L 387
 
6.7%
M 341
 
5.9%
I 339
 
5.9%
X 273
 
4.7%
O 244
 
4.2%
F 222
 
3.8%
Other values (14) 1033
17.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 5782
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 1210
20.9%
C 894
15.5%
N 425
 
7.4%
T 414
 
7.2%
L 387
 
6.7%
M 341
 
5.9%
I 339
 
5.9%
X 273
 
4.7%
O 244
 
4.2%
F 222
 
3.8%
Other values (14) 1033
17.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5782
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 1210
20.9%
C 894
15.5%
N 425
 
7.4%
T 414
 
7.2%
L 387
 
6.7%
M 341
 
5.9%
I 339
 
5.9%
X 273
 
4.7%
O 244
 
4.2%
F 222
 
3.8%
Other values (14) 1033
17.9%

race
Categorical

Distinct5
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size22.7 KiB
Hispanic or Latino
596 
White
589 
Black or African-American
584 
Asian
583 
American Indian and Alaska Native
539 

Length

Max length33
Median length25
Mean length16.940505
Min length5

Characters and Unicode

Total characters48975
Distinct characters26
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowHispanic or Latino
2nd rowWhite
3rd rowAsian
4th rowBlack or African-American
5th rowWhite

Common Values

ValueCountFrequency (%)
Hispanic or Latino 596
20.6%
White 589
20.4%
Black or African-American 584
20.2%
Asian 583
20.2%
American Indian and Alaska Native 539
18.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Most occurring words
ValueCountFrequency (%)
or 1180
15.9%
hispanic 596
8.0%
latino 596
8.0%
white 589
8.0%
black 584
7.9%
african-american 584
7.9%
asian 583
7.9%
american 539
7.3%
indian 539
7.3%
and 539
7.3%
Other values (2) 1078
14.6%

Most occurring characters

ValueCountFrequency (%)
a 6761
13.8%
i 5745
11.7%
n 5099
10.4%
[space] 4516
 
9.2%
r 2887
 
5.9%
c 2887
 
5.9%
A 2829
 
5.8%
e 2251
 
4.6%
o 1776
 
3.6%
t 1724
 
3.5%
Other values (16) 12500
25.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 37603
76.8%
Uppercase Letter 6272
 
12.8%
Space Separator 4516
 
9.2%
Dash Punctuation 584
 
1.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 6761
18.0%
i 5745
15.3%
n 5099
13.6%
r 2887
7.7%
c 2887
7.7%
e 2251
 
6.0%
o 1776
 
4.7%
t 1724
 
4.6%
s 1718
 
4.6%
l 1123
 
3.0%
Other values (7) 5632
15.0%
Uppercase Letter
ValueCountFrequency (%)
A 2829
45.1%
L 596
 
9.5%
H 596
 
9.5%
W 589
 
9.4%
B 584
 
9.3%
I 539
 
8.6%
N 539
 
8.6%
Space Separator
ValueCountFrequency (%)
[space] 4516
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 584
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 43875
89.6%
Common 5100
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 6761
15.4%
i 5745
13.1%
n 5099
11.6%
r 2887
 
6.6%
c 2887
 
6.6%
A 2829
 
6.4%
e 2251
 
5.1%
o 1776
 
4.0%
t 1724
 
3.9%
s 1718
 
3.9%
Other values (14) 10198
23.2%
Common
ValueCountFrequency (%)
[space] 4516
88.5%
- 584
 
11.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 48975
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 6761
13.8%
i 5745
11.7%
n 5099
10.4%
[space] 4516
 
9.2%
r 2887
 
5.9%
c 2887
 
5.9%
A 2829
 
5.8%
e 2251
 
4.6%
o 1776
 
3.6%
t 1724
 
3.5%
Other values (16) 12500
25.5%

count
Real number (ℝ)

Distinct2785
Distinct (%)96.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean48963.774
Minimum98
Maximum3835726
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size22.7 KiB

Quantile statistics

Minimum98
5-th percentile778.5
Q13435
median13780
Q354447
95-th percentile162670.5
Maximum3835726
Range3835628
Interquartile range (IQR)51012

Descriptive statistics

Standard deviation144385.59
Coefficient of variation (CV)2.9488247
Kurtosis246.57282
Mean48963.774
Median Absolute Deviation (MAD)12231
Skewness12.973526
Sum1.4155427 × 108
Variance2.0847198 × 1010
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)

Common Values
ValueCountFrequency (%)
535 3
 
0.1%
713 3
 
0.1%
1343 3
 
0.1%
876 3
 
0.1%
6547 3
 
0.1%
1615 3
 
0.1%
251 3
 
0.1%
881 3
 
0.1%
1713 2
 
0.1%
906 2
 
0.1%
Other values (2775) 2863
99.0%

Minimum 10 values
ValueCountFrequency (%)
98 1
< 0.1%
128 1
< 0.1%
158 1
< 0.1%
182 1
< 0.1%
203 1
< 0.1%
204 1
< 0.1%
211 1
< 0.1%
216 1
< 0.1%
219 1
< 0.1%
227 1
< 0.1%

Maximum 10 values
ValueCountFrequency (%)
3835726 1
< 0.1%
2485125 1
< 0.1%
2192248 1
< 0.1%
2177650 1
< 0.1%
1936732 1
< 0.1%
1386389 1
< 0.1%
1374535 1
< 0.1%
1304564 1
< 0.1%
1240092 1
< 0.1%
1161455 1
< 0.1%

updated_at
DateTime

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size22.7 KiB
Minimum2022-12-10 21:24:34.727669
Maximum2022-12-10 21:24:34.727669
Histogram with fixed size bins (bins=1)

Interactions

[Figure: pairwise interaction plots for the 8 numeric variables (64 panels); images not preserved in this export]

Correlations


Auto

The auto setting chooses an interpretable pairwise metric for each pair of columns according to their variable types:
  • Variable type pair: method, range
  • Categorical-Categorical: Cramér's V, [0, 1]
  • Numerical-Categorical: Cramér's V, [0, 1] (using a discretized numerical column)
  • Numerical-Numerical: Spearman's ρ, [-1, 1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.
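To pair a numerical column with a categorical one, the numerical column is first discretized into `n_bins` groups. The report does not say how the bin edges are drawn; the sketch below assumes equal-width bins purely to illustrate how the bin count changes granularity. The function name `equal_width_bins` is hypothetical, and its output labels could then be fed into a Cramér's V computation.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (labels 0..n_bins-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # constant column: everything in one bin
    # The maximum value would land in bin n_bins, so clip it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(equal_width_bins(list(range(11)), 5))
print(equal_width_bins(list(range(11)), 2))
```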

This configuration uses the recommended metric for each pair of columns.

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 no monotonic correlation, and +1 a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations. Tied values receive the average of the ranks they span.
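The recipe above translates directly into code: replace each variable by its ranks (averaging ranks over ties) and take the Pearson correlation of the ranks. The normalization of covariance and standard deviation cancels, so plain sums of deviations suffice. This is a minimal stdlib sketch with hypothetical function names, not the report's implementation.

```python
import math


def pearson(x, y):
    """Pearson's r from sums of centred products (n vs n-1 cancels out)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho = Pearson's r of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v * v for v in x]  # monotonic but nonlinear
print(spearman(x, y), pearson(x, y))
```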

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 no linear correlation, and +1 a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and under separate positive changes in scale of the two variables. Consequently, for points lying exactly on a non-horizontal line, the line's slope affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
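The definition and the invariance property can be checked numerically. In the sketch below (stdlib only, hypothetical function name), shifting and positively rescaling each variable leaves r unchanged, and points on a straight line give r = +1 or -1 depending only on the sign of the slope. A horizontal line has zero variance, so r is undefined there and the function raises an error.

```python
import math


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
r = pearson(x, y)
# Separate shift and positive rescaling of each variable: r is unchanged.
r_moved = pearson([2 * a + 1 for a in x], [5 * b - 3 for b in y])
print(r, r_moved)
```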

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative association, 0 no association, and +1 perfect positive association.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. In its basic form (τ-a), τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
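A direct O(n²) sketch of τ-a follows. A pair is concordant when X and Y move in the same direction between the two observations, and discordant when they move in opposite directions. Tied pairs count as neither here; library implementations usually default to the tie-adjusted τ-b variant, so results can differ when ties are present. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n(n-1)/2)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both variables move in the same direction
        elif s < 0:
            discordant += 1  # they move in opposite directions
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```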

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimator of Cramér's V has been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
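The sketch below computes χ² from the contingency table of two categorical columns and then applies the Bergsma (2013) correction: φ̃² = max(0, χ²/n - (k-1)(r-1)/(n-1)), r̃ = r - (r-1)²/(n-1), k̃ = k - (k-1)²/(n-1), and Ṽ = sqrt(φ̃² / min(k̃-1, r̃-1)). It is a stdlib illustration with a hypothetical function name; the report's own implementation may differ in details such as whether a continuity correction is applied to χ².

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramer's V (Bergsma 2013) for two categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    rows, cols = Counter(x), Counter(y)

    # Pearson's chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, ra in rows.items():
        for b, cb in cols.items():
            expected = ra * cb / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    r, k = len(rows), len(cols)
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(phi2_corr / denom)
```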

Phik (φk)

Phik (φk) is a recent, practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
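One common way to define nullity correlation, used for example by the missingno library, is the Pearson correlation between the two columns' is-missing indicators (1 if the cell is missing, 0 otherwise). The sketch below assumes that definition, represents missing cells as `None`, and uses a hypothetical function name. A column that is always or never missing has a constant indicator, so its nullity correlation is undefined and the sketch returns `None`.

```python
import math


def nullity_correlation(a, b):
    """Pearson correlation of the is-missing indicators of two columns."""
    ia = [1.0 if v is None else 0.0 for v in a]
    ib = [1.0 if v is None else 0.0 for v in b]
    n = len(ia)
    ma, mb = sum(ia) / n, sum(ib) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(ia, ib))
    saa = sum((p - ma) ** 2 for p in ia)
    sbb = sum((q - mb) ** 2 for q in ib)
    if saa == 0 or sbb == 0:
        return None  # a column that is always or never missing
    return sab / math.sqrt(saa * sbb)
```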

Sample

First rows

# | city | state | median_age | male_population | female_population | total_population | number_of_veterans | foreign_born | average_household_size | state_code | race | count | updated_at
0 | Silver Spring | Maryland | 33.8 | 40601.0 | 41862.0 | 82463 | 1562.0 | 30908.0 | 2.60 | MD | Hispanic or Latino | 25924 | 2022-12-10 21:24:34.727669
1 | Quincy | Massachusetts | 41.0 | 44129.0 | 49500.0 | 93629 | 4147.0 | 32935.0 | 2.39 | MA | White | 58723 | 2022-12-10 21:24:34.727669
2 | Hoover | Alabama | 38.5 | 38040.0 | 46799.0 | 84839 | 4819.0 | 8229.0 | 2.58 | AL | Asian | 4759 | 2022-12-10 21:24:34.727669
3 | Rancho Cucamonga | California | 34.5 | 88127.0 | 87105.0 | 175232 | 5821.0 | 33878.0 | 3.18 | CA | Black or African-American | 24437 | 2022-12-10 21:24:34.727669
4 | Newark | New Jersey | 34.6 | 138040.0 | 143873.0 | 281913 | 5829.0 | 86253.0 | 2.73 | NJ | White | 76402 | 2022-12-10 21:24:34.727669
5 | Peoria | Illinois | 33.1 | 56229.0 | 62432.0 | 118661 | 6634.0 | 7517.0 | 2.40 | IL | American Indian and Alaska Native | 1343 | 2022-12-10 21:24:34.727669
6 | Avondale | Arizona | 29.1 | 38712.0 | 41971.0 | 80683 | 4815.0 | 8355.0 | 3.18 | AZ | Black or African-American | 11592 | 2022-12-10 21:24:34.727669
7 | West Covina | California | 39.8 | 51629.0 | 56860.0 | 108489 | 3800.0 | 37038.0 | 3.56 | CA | Asian | 32716 | 2022-12-10 21:24:34.727669
8 | O'Fallon | Missouri | 36.0 | 41762.0 | 43270.0 | 85032 | 5783.0 | 3269.0 | 2.77 | MO | Hispanic or Latino | 2583 | 2022-12-10 21:24:34.727669
9 | High Point | North Carolina | 35.5 | 51751.0 | 58077.0 | 109828 | 5204.0 | 16315.0 | 2.65 | NC | Asian | 11060 | 2022-12-10 21:24:34.727669

Last rows

# | city | state | median_age | male_population | female_population | total_population | number_of_veterans | foreign_born | average_household_size | state_code | race | count | updated_at
2881 | Gulfport | Mississippi | 35.1 | 33108.0 | 38764.0 | 71872 | 6646.0 | 3072.0 | 2.54 | MS | White | 42870 | 2022-12-10 21:24:34.727669
2882 | Davis | California | 26.3 | 33493.0 | 34163.0 | 67656 | 2176.0 | 13997.0 | 2.69 | CA | American Indian and Alaska Native | 779 | 2022-12-10 21:24:34.727669
2883 | Los Angeles | California | 35.0 | 1958998.0 | 2012898.0 | 3971896 | 85417.0 | 1485425.0 | 2.86 | CA | Black or African-American | 404868 | 2022-12-10 21:24:34.727669
2884 | Mount Vernon | New York | 38.5 | 31876.0 | 36745.0 | 68621 | 2064.0 | 23777.0 | 2.85 | NY | Hispanic or Latino | 9446 | 2022-12-10 21:24:34.727669
2885 | Lynchburg | Virginia | 28.7 | 38614.0 | 41198.0 | 79812 | 4322.0 | 4364.0 | 2.48 | VA | White | 53727 | 2022-12-10 21:24:34.727669
2886 | Stockton | California | 32.5 | 150976.0 | 154674.0 | 305650 | 12822.0 | 79583.0 | 3.16 | CA | American Indian and Alaska Native | 19834 | 2022-12-10 21:24:34.727669
2887 | Southfield | Michigan | 41.6 | 31369.0 | 41808.0 | 73177 | 4035.0 | 4011.0 | 2.27 | MI | American Indian and Alaska Native | 983 | 2022-12-10 21:24:34.727669
2888 | Indianapolis | Indiana | 34.1 | 410615.0 | 437808.0 | 848423 | 42186.0 | 72456.0 | 2.53 | IN | White | 553665 | 2022-12-10 21:24:34.727669
2889 | Somerville | Massachusetts | 31.0 | 41028.0 | 39306.0 | 80334 | 2103.0 | 22292.0 | 2.43 | MA | American Indian and Alaska Native | 374 | 2022-12-10 21:24:34.727669
2890 | Coral Springs | Florida | 37.2 | 63316.0 | 66186.0 | 129502 | 4724.0 | 38552.0 | 3.17 | FL | White | 90896 | 2022-12-10 21:24:34.727669